Measuring Semantic Coverage
Abstract
The development of natural language processing systems is currently driven to a large extent by measures of knowledge-base size and coverage of individual phenomena relative to a corpus. While these measures have led to significant advances for knowledge-lean applications, they do not adequately motivate progress in computational semantics leading to the development of large-scale, general-purpose NLP systems. In this article, we argue that depth of semantic representation is essential for covering a broad range of phenomena in the computational treatment of language and propose depth as an important additional dimension for measuring the semantic coverage of NLP systems. We propose an operationalization of this measure and show how to characterize an NLP system along the dimensions of size, corpus coverage, and depth. The proposed framework is illustrated using several prominent NLP systems. We hope the preliminary proposals made in this article will lead to prolonged debates in the field and will continue to be refined.

1 Measures of Size versus Measures of Depth

Evaluation of the current and potential performance of an NLP system or method is of crucial importance to researchers, developers and users. Current performance of systems is directly measured using a variety of tests and techniques. Often, as in the case of machine translation or information extraction, an entire "industry" of evaluation gets developed (see, for example, ARPA MT Evaluation; MUC-4). Measuring the performance of an NLP method, approach or technique (and through it the promise of a system based on it) is more difficult, as judgments must be made about "blame assignment" and the impact of improving a variety of system components on the overall future performance.

One of the widely accepted measures of potential performance improvement is the feasibility of scaling up the static knowledge sources of an NLP system: its grammars, lexicons, world knowledge bases and other sets of language descriptions (the reasoning being that the larger the system's grammars and lexicons, the greater the percentage of input they would be able to match and, therefore, the better the performance of the system[1]). As a result, a system would be considered very promising if its knowledge sources could be significantly scaled up at a reasonable expense. Naturally, the expense is lowest if acquisition is performed automatically. This consideration and the recent resurgence of corpus-based methods heighten the interest in the automation of knowledge acquisition. However, we believe that such acquisition should not be judged solely by the utility of the acquired knowledge for a particular application.

A preliminary to the scalability estimates is a judgment of the current coverage of a system's static knowledge sources. Unfortunately, judgments based purely on size are often misleading. While they may be sufficiently straightforward for less knowledge-intensive methods used in such applications as information extraction and retrieval, part-of-speech tagging, bilingual corpus alignment, and so on, the same is not true of more rule- and knowledge-based methods (such as syntactic parsers, semantic analyzers, semantic lexicons, ontological world models, etc.). It is widely accepted, for instance, that judgments of the coverage of a syntactic grammar in terms of the number of rules are flawed.
It is somewhat less self-evident, however, that the number of lexicon entries or ontology concepts is not an adequate measure of the quality or coverage of NLP systems.

[1] Incidentally, this consideration contributes to the evaluation of current performance as well. In the absence of actual evaluation results, it is customary to claim the utility of a system by simply mentioning the size of its knowledge sources (e.g., "over 550 grammar rules, over 50,000 concepts in the ontology and over 100,000 word senses in the dictionary").
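As a rough illustration of how the size and corpus-coverage dimensions can diverge, and how a depth label might be recorded alongside them, the following Python sketch computes the fraction of corpus tokens matched by a toy lexicon. The function names, the toy data, and the coarse depth label are assumptions made for illustration only; they do not reproduce the operationalization proposed in the article.

```python
# Hypothetical sketch: characterizing an NLP system's knowledge sources along
# the three dimensions discussed above -- size, corpus coverage, and depth.
# The toy lexicon, corpus, and depth label are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class SemanticCoverageProfile:
    size: int               # number of entries (e.g., lexicon senses, ontology concepts)
    corpus_coverage: float  # fraction of corpus tokens matched by the knowledge source
    depth: str              # coarse label for depth of semantic representation


def corpus_coverage(lexicon: set, corpus_tokens: list) -> float:
    """Fraction of corpus tokens the lexicon can match (a figure size alone does not give)."""
    if not corpus_tokens:
        return 0.0
    covered = sum(1 for token in corpus_tokens if token.lower() in lexicon)
    return covered / len(corpus_tokens)


def profile_system(lexicon: set, corpus_tokens: list, depth: str) -> SemanticCoverageProfile:
    return SemanticCoverageProfile(
        size=len(lexicon),
        corpus_coverage=corpus_coverage(lexicon, corpus_tokens),
        depth=depth,
    )


if __name__ == "__main__":
    toy_lexicon = {"the", "system", "parses", "input", "sentences"}
    toy_corpus = "The system parses every input sentence it receives".split()
    # "tagged-senses" stands in for one point on a depth scale, e.g., somewhere
    # between part-of-speech tags and full ontological concept mappings.
    print(profile_system(toy_lexicon, toy_corpus, depth="tagged-senses"))
```

In this toy case the lexicon has five entries but matches only half of the corpus tokens, which is exactly the kind of gap between raw size and corpus coverage that the argument above points to; depth, by contrast, is not reducible to either number and is carried here only as a label.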
Similar resources
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, recommending items which seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Exploitation of semantic similarity for adaptation of existing terminologies within biomedical area
We present a novel method for adaptation of existing terminologies. Within the biomedical domain, and when no textual corpora for building terminologies are available, we exploit the UMLS Metathesaurus, which merges over a hundred existing biomedical terminologies and ontologies. We also exploit algorithms for measuring semantic similarity in order to limit, within UMLS, a semantically homogeneous sp...
LODDO: Using Linked Open Data Description Overlap to Measure Semantic Relatedness between Named Entities
Measuring semantic relatedness plays an important role in information retrieval and Natural Language Processing. However, little attention has been paid to measuring semantic relatedness between named entities, which is also very significant. As the existing knowledge-based approaches have an entity coverage issue and the statistics-based approaches give unreliable results for low-frequency enti...
A Simple Metric to Measure Semantic Overlap between Models: Application and Visualization
This paper investigates a fairly simple but easily automatable metric for measuring the semantic overlap between models as a proxy for the degree of overlap in their domain coverage. Such a metric is very useful when evaluating competing models with fully or partially overlapping domains, be it for purposes of model integration, re-use or selection. The proposed metric is based on the seman...
Building Semantic Networks from Plain Text and Wikipedia with Application to Semantic Relatedness and Noun Compound Paraphrasing
The construction of suitable and scalable representations of semantic knowledge is a core challenge in Semantic Computing. Manually created resources such as WordNet have been shown to be useful for many AI and NLP tasks, but they are inherently restricted in their coverage and scalability. In addition, they have been challenged by simple distributional models on very large corpora, questioning...
A Probabilistic Model for Measuring Grammaticality and Similarity of Automatically Generated Paraphrases of Predicate Phrases
The most critical issue in generating and recognizing paraphrases is development of wide-coverage paraphrase knowledge. Previous work on paraphrase acquisition has collected lexicalized pairs of expressions; however, the results do not ensure full coverage of the various paraphrase phenomena. This paper focuses on productive paraphrases realized by general transformation patterns, and addresses...